Normalized Compression Distance Based Measures for MetricsMATR 2010
Abstract
We present the MT-NCD and MT-mNCD machine translation evaluation metrics as our submission to the machine translation evaluation shared task (MetricsMATR 2010). The metrics are based on normalized compression distance (NCD), a general information-theoretic measure of string similarity, and are evaluated against human judgments from the WMT08 shared task. The experiments show that 1) flexible matching improves the metric's correlation with human judgments, 2) segment replication is effective, and 3) our NCD-inspired method for handling multiple references appears to improve results. Overall, the proposed MT-NCD and MT-mNCD methods correlate competitively with human judgments compared to commonly used machine translation evaluation metrics such as BLEU.
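For orientation, the standard NCD between two strings x and y is (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C gives the compressed length. The following is a minimal sketch of that plain formula only, not of the MT-NCD or MT-mNCD metrics themselves. The choice of zlib as the compressor and the helper names `compressed_size` and `ncd` are assumptions made for illustration; the abstract does not say which compressor the authors used.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed data (compressor choice is an assumption)."""
    return len(zlib.compress(data, 9))


def ncd(x: str, y: str) -> float:
    """Plain normalized compression distance between two strings.

    NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)),
    where C is the compressed length. Lower values mean more similar strings.
    """
    bx, by = x.encode("utf-8"), y.encode("utf-8")
    cx, cy = compressed_size(bx), compressed_size(by)
    cxy = compressed_size(bx + by)  # compress the concatenation of both strings
    return (cxy - min(cx, cy)) / max(cx, cy)


if __name__ == "__main__":
    # Hypothetical example sentences, not taken from the WMT08 data.
    reference = "The quick brown fox jumps over the lazy dog near the quiet river bank at dawn."
    close_candidate = "The quick brown fox jumped over a lazy dog near the quiet river bank at dawn."
    far_candidate = "Stock markets fell sharply yesterday amid growing concerns about inflation and rates."
    print("close:", ncd(reference, close_candidate))
    print("far:  ", ncd(reference, far_candidate))
```

With a real compressor the distance is only approximate: short inputs carry fixed header overhead, so the score for two identical strings is small but not exactly zero.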
Similar resources
Normalized Information Distance is Not Semicomputable
Normalized information distance (NID) uses the theoretical notion of Kolmogorov complexity, which for practical purposes is approximated by the length of the compressed version of the file involved, using a real-world compression program. This practical application is called ‘normalized compression distance’ and it is trivially computable. It is a parameter-free similarity measure based on comp...
New Similarity Measures of Fuzzy Soft Sets Based on Distance Measures
Similarity measure is a very important problem in fuzzy soft set theory. In this paper, seven similarity measures of fuzzy soft sets are introduced, which are based on the normalized Hamming distance, the normalized Euclidean distance, the generalized normalized distance, the Type-2 generalized normalized distance, the Type-2 normalized Euclidean distance, the Hausdorff distance and the Chebysh...
Chapter 3 Normalized Information Distance
The normalized information distance is a universal distance measure for objects of all kinds. It is based on Kolmogorov complexity and thus uncomputable, but there are ways to utilize it. First, compression algorithms can be used to approximate the Kolmogorov complexity if the objects have a string representation. Second, for names and abstract concepts, page count statistics from the World Wid...
Normalized Compression Distance as automatic MT evaluation metric
This paper evaluates a new automatic MT evaluation metric, Normalized Compression Distance (NCD), which is a general tool for measuring similarities between binary strings. We provide system-level correlations and sentence-level consistencies to human judgements and comparison to other automatic measures with the WMT’08 dataset. The results show that the general NCD metric is at the same level ...
Perceptual Normalized Information Distance for Image Distortion Analysis Based on Kolmogorov Complexity
Image distortion analysis is a fundamental issue in many image processing problems, including compression, restoration, recognition, classification, and retrieval. In this work, we investigate the problem of image distortion measurement based on the theories of Kolmogorov complexity and normalized information distance (NID), which have rarely been studied in the context of image processing. Bas...